Missing Value Estimation Based on Dynamic Attribute Selection
Abstract
Raw data used in data mining often contain missing information, which inevitably degrades the quality of the derived knowledge. This paper proposes a new method for estimating missing attribute values. The method selects attributes one at a time, using the mutual information of an attribute group, calculated by flattening the already selected attributes. As each new attribute is added, its missing values are filled in by generating a decision tree, so the previously filled-in values are naturally reused. This ordered estimation of missing values is compared with several conventional methods, including Lobo's ordered estimation, which uses a static ranking of attributes. Experimental results show that the method achieves good recognition ratios in almost all domains with many missing values.
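The ordered estimation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: attributes are discrete, missing values are marked with `None`, "flattening" is read as combining the selected attributes into one composite tuple-valued attribute, and class labels are available for every row. The paper fills values with a decision tree; to keep the sketch dependency-free, a simpler stand-in is used here (a majority vote among rows that agree on the already selected attributes and the class). All function names are hypothetical.

```python
from collections import Counter
from math import log2

MISSING = None  # assumed marker for a missing attribute value


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    if n == 0:
        return 0.0
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def group_mi(rows, labels, selected, candidate):
    """MI between the class and the flattened group selected + [candidate].

    Only rows where the candidate is observed are used; the selected
    attributes have already been filled in at this point.
    """
    keep = [i for i, r in enumerate(rows) if r[candidate] is not MISSING]
    flat = [tuple(rows[i][a] for a in selected + [candidate]) for i in keep]
    return mutual_information(flat, [labels[i] for i in keep])


def fill_attribute(rows, labels, selected, target):
    """Fill missing values of `target` in place.

    Stand-in for the paper's decision tree: majority vote among rows that
    share the same selected-attribute values and class, with the global
    mode of the attribute as a fallback.
    """
    known = [r[target] for r in rows if r[target] is not MISSING]
    if not known:
        return  # nothing to learn from; leave the column untouched
    global_mode = Counter(known).most_common(1)[0][0]
    votes = {}
    for r, y in zip(rows, labels):
        if r[target] is not MISSING:
            key = (tuple(r[a] for a in selected), y)
            votes.setdefault(key, Counter())[r[target]] += 1
    for r, y in zip(rows, labels):
        if r[target] is MISSING:
            key = (tuple(r[a] for a in selected), y)
            r[target] = (votes[key].most_common(1)[0][0]
                         if key in votes else global_mode)


def impute(rows, labels):
    """Select attributes one by one and fill each as it is added."""
    rows = [list(r) for r in rows]  # work on a copy
    remaining = list(range(len(rows[0])))
    selected = []
    while remaining:
        # Dynamic selection: the ranking is recomputed after every addition.
        best = max(remaining,
                   key=lambda a: group_mi(rows, labels, selected, a))
        fill_attribute(rows, labels, selected, best)
        selected.append(best)
        remaining.remove(best)
    return rows, selected
```

A call such as `impute(rows, labels)` returns the filled copy of the data together with the order in which attributes were selected. Because the ranking is recomputed after each attribute is added, the order depends on the attributes chosen so far, in contrast to a static ranking fixed in advance.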
Similar resources
Missing Value Imputation Method Based on Density Clustering and Grey Relational Analysis
In the computer-aided medical diagnosis, the problem of missing attribute values in many medical data sets brings a great challenge to data mining. To solve the problem, this paper proposes a method based on density clustering and grey relational analysis. It provides an effective solution for missing medical data. The method uses the characteristic and degree of data samples dynamic relation a...
Predicting Missing Attribute Values Using k-Means Clustering
Problem statement: Predicting the value for missing attributes is an important data preprocessing problem in data mining and knowledge discovery tasks. Several methods have been proposed to treat missing data and the one used more frequently is deleting instances containing at least one missing value of a feature. When the dataset has minimum number of missing attribute values then we can negle...
Missing-value estimation using linear and non-linear regression with Bayesian gene selection
MOTIVATION Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value est...
Modified Deviation Approach to Deal with Missing Attribute values in Data Mining with Different percentage of Missing Values
Information System having missing attribute values (in practical) hampers accurate estimation of Data Mining. If missing attribute values can be predicted in the pre-processing stage of data mining then it will help to improve the accuracy, and the existing data mining algorithms can also be applied based on complete data. In this work different type of methods available to handle incomplete in...
Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets
Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may...